Data normalization
Initialization
We start by loading the packages required for all analyses in this section. We also define the root directory under which all input/output operations for this project are performed. Detailed software version information is provided at the end of this document to ease reproducibility of the analysis.
library(DT)
library(tidyverse)
library(data.table)
library(WriteXLS)
library(ggrepel)
library(ggpubr)
library(patchwork)
library(pheatmap)
library(RColorBrewer)
path = "/Users/ashwin/Documents/Projects/YeastScreen/Essential_DAmP_screen/"
Normalization methodology
We use two different normalization strategies to answer specific questions:
- For a given plate \(i\), the cytoplasmic and mitochondrial roGFP2 ratios are normalized by the median of all controls on the same plate, i.e. for the cytoplasm \[NormCyto(i) = \frac{Cyto_i}{median(Ctrl_i)}\] and for the mitochondria \[NormMito(i) = \frac{Mito_i}{median(Ctrl_i)}\]
This control-based plate normalization is suitable for comparing redox levels across organelles and nutrients, because the roGFP2 ratios of both organelles are scaled to a common reference, the plate controls.
- For a given plate \(i\), each cytoplasmic or mitochondrial roGFP2 ratio \(j\) is centered on the plate-specific median of its organelle and scaled by the corresponding median absolute deviation (a robust Z score), i.e. for the cytoplasm \[NormCyto_{ij} = \frac{Cyto_{ij} - median(Cyto_{i})}{mad(Cyto_{i})}\] and for the mitochondria \[NormMito_{ij} = \frac{Mito_{ij} - median(Mito_{i})}{mad(Mito_{i})}\] where \(mad\) is computed with R's mad() function, which by default scales the median absolute deviation by 1.4826.
This organelle-specific normalization is suitable for comparing redox levels within each organelle across all plates, since it expresses every value as its distance from the plate median in units of robust spread.
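The two formulas can be illustrated on a single toy plate. The sketch below is a minimal base R illustration under stated assumptions: the data frame plate, its columns Content and ratio, and the helpers fracControlSketch() and robustZSketch() are hypothetical stand-ins (not objects from this analysis) and all values are made up. It shows that control-based values are plain ratios to the control median, while robust Z values are centered on zero within each organelle.

```r
# Hypothetical single plate (made-up values): three control wells and
# four wells for each organelle
plate <- data.frame(
  Content = c(rep("Control", 3), rep("Cytoplasm", 4), rep("Mitochondria", 4)),
  ratio   = c(1.0, 1.2, 1.1,  1.5, 1.6, 1.4, 3.0,  2.0, 2.2, 1.8, 2.1),
  stringsAsFactors = FALSE
)

# Strategy 1 (hypothetical helper): divide by the median of the plate controls
fracControlSketch <- function(d) {
  ctrl_med <- median(d$ratio[d$Content == "Control"], na.rm = TRUE)
  out <- d[d$Content != "Control", ]
  out$norm <- out$ratio / ctrl_med
  out
}

# Strategy 2 (hypothetical helper): robust Z score within each organelle;
# mad() is scaled by its default constant 1.4826
robustZSketch <- function(d) {
  out <- d[d$Content != "Control", ]
  out$norm <- ave(out$ratio, out$Content, FUN = function(x) {
    (x - median(x, na.rm = TRUE)) / mad(x, na.rm = TRUE)
  })
  out
}

fc <- fracControlSketch(plate)
rz <- robustZSketch(plate)
print(fc)
print(rz)
```

Because the robust Z uses the median and MAD rather than the mean and standard deviation, the single extreme cytoplasmic well (3.0) barely shifts the center and spread of its plate.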
Furthermore, for each normalization strategy, we summarized the quadruplicate values per mutant by taking the median of the scaled values. Our outlier removal strategy in the previous section introduced many NA values. Since each mutant has 4 values, we removed mutants with 2 or more NA values, and for mutants with a single NA value we substituted it with the median of the other 3 observed values. In addition, some mutants (genes) occur multiple times, on the same plate or on different plates. For each such gene, per organelle and nutrient condition, we kept the copy (its quadruplicate values and their median) with the largest absolute median value.
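A minimal base R sketch of these three summarization rules for one organelle and nutrient condition, on a hypothetical replicate table (gene names, the columns r1 to r4 and all values are made up). It mirrors, but is not identical to, the dplyr pipeline in normalizeScreenData() below: !duplicated() keeps exactly one copy per gene, whereas top_n() keeps all tied copies. Note that replacing a single missing value by the median of the other three leaves the median of all four values unchanged, so the imputation does not alter the summarized value.

```r
# Hypothetical replicate table for ONE organelle and nutrient condition:
# gene "A" occurs twice (two mutant copies), "B" has one missing replicate
# and "C" has two missing replicates (all values made up)
reps <- data.frame(
  Genes = c("A", "A", "B", "C"),
  r1 = c(1.0, 2.0, 1.1,  NA),
  r2 = c(1.2, 2.4,  NA,  NA),
  r3 = c(0.9, 2.2, 1.3, 1.0),
  r4 = c(1.1, 2.1, 1.5, 1.2),
  stringsAsFactors = FALSE
)
rep_cols <- c("r1", "r2", "r3", "r4")
m <- as.matrix(reps[, rep_cols])

# Median of the observed replicates and number of missing replicates per row
reps$Median <- apply(m, 1, median, na.rm = TRUE)
n_na <- rowSums(is.na(m))

# Rule 1: drop mutants with 2 or more missing replicates
reps <- reps[n_na < 2, ]

# Rule 2: replace a single missing replicate by the median of the other three
for (col in rep_cols) {
  miss <- is.na(reps[[col]])
  reps[[col]][miss] <- reps$Median[miss]
}

# Rule 3: for repeated genes keep the copy with the largest absolute median
reps <- reps[order(reps$Genes, -abs(reps$Median)), ]
reps <- reps[!duplicated(reps$Genes), ]
print(reps)
```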
As a result, we generate the following:
- A table with all replicate values per organelle and nutrient condition, together with the median summarized value, from all plates
- Matrices with genes/mutants in rows and all combinations of organelle, nutrient and replicate in columns, plus a matrix of the median summarized values
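The wide matrix layout can be sketched in base R by pre-allocating a genes-by-conditions matrix of NA and filling only the observed cells, which has the same effect as the dcast(..., fill = NA) calls in the code below. The long table, its column names and its values are hypothetical.

```r
# Hypothetical long table: one summarized value per gene, organelle and nutrient
long <- data.frame(
  Genes     = c("A", "A", "B", "B", "C"),
  Organelle = c("Mitochondria", "Cytoplasm", "Mitochondria", "Cytoplasm", "Cytoplasm"),
  Nutrient  = c("Glucose", "Glucose", "Glucose", "Glucose", "Glycerol"),
  Median    = c(0.5, -1.2, 2.0, 0.1, 3.3),
  stringsAsFactors = FALSE
)

# Column key in the "Organelle_Nutrient" style of the matrix column names
key <- paste(long$Organelle, long$Nutrient, sep = "_")

# Pre-allocate a genes-by-conditions matrix of NA and fill the observed cells;
# combinations that were never measured stay NA (the effect of fill = NA)
genes <- sort(unique(long$Genes))
conds <- sort(unique(key))
wide <- matrix(NA_real_, nrow = length(genes), ncol = length(conds),
               dimnames = list(genes, conds))
wide[cbind(match(long$Genes, genes), match(key, conds))] <- long$Median
print(wide)
```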
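Below we load the cleaned raw data used in this analysis: the cleaned table (used for the robust Z normalization) and a version additionally filtered for outliers and poorly performing plate controls (used for the control-based normalization).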
rawDatCleaned = readRDS(paste0(path, "data/workspaces/YeastMutantRedox_RawDataCleaned.RDS"))
rawDatCleaned_ctrlfilt = readRDS(paste0(path, "data/workspaces/YeastMutantRedox_RawData_Outliers_and_PoorControlFiltered.RDS"))
normalizeScreenData = function(normMethod)
{
if(normMethod == "fracControl"){
normDat = split(rawDatCleaned_ctrlfilt, as.character(rawDatCleaned_ctrlfilt$Type))
normDat = lapply(normDat, function(x) {
split(x, x$Plate)})
}
if(normMethod == "robustZ"){
normDat = split(rawDatCleaned, rawDatCleaned$Type)
normDat = lapply(normDat, function(x) {
split(x, x$Plate)})
}
normDatReps = normDat
for (i in 1:length(normDat))
{
for (j in 1:length(normDat[[i]]))
{
tmp = droplevels(normDat[[i]][[j]])
if(nrow(tmp) > 0) # skip empty plates, e.g. plate 150, which was filtered out in the previous step
{
#---------------------------------------------------------------------------------------------
# Separate control, cytoplasmic and mitochondrial roGFP2 ratios per plate
#---------------------------------------------------------------------------------------------
# Control
tmp.ctrl = droplevels(tmp[tmp$Content == "Control",])
# Cytoplasm
tmp.cyto = droplevels(tmp[tmp$Content == "Cytoplasm",])
# Mitochondria
tmp.mito = droplevels(tmp[tmp$Content == "Mitochondria",])
rm(tmp)
#---------------------------------------------------------------------------------------------
# Compute the corresponding normalization factors (median & median absolute deviation)
#---------------------------------------------------------------------------------------------
# Normalizing factor - Control
nf.ctrl.med = median(tmp.ctrl$roGFP2.ratio, na.rm = T)
# Normalizing factor - Cytoplasm
nf.cyto.med = median(tmp.cyto$roGFP2.ratio, na.rm = T)
nf.cyto.mad = mad(tmp.cyto$roGFP2.ratio, na.rm = T)
# Normalizing factor - Mitochondria
nf.mito.med = median(tmp.mito$roGFP2.ratio, na.rm = T)
nf.mito.mad = mad(tmp.mito$roGFP2.ratio, na.rm = T)
#---------------------------------------------------------------------------------------------
# The two normalization strategies (plate control and median based)
#---------------------------------------------------------------------------------------------
# Normalization 1 - Plate control based
if (normMethod == "fracControl")
{
tmp.cyto$roGFP2.ratio = tmp.cyto$roGFP2.ratio / nf.ctrl.med
tmp.mito$roGFP2.ratio = tmp.mito$roGFP2.ratio / nf.ctrl.med
}
# Normalization 2 - Plate median based
if (normMethod == "robustZ")
{
tmp.cyto$roGFP2.ratio = (tmp.cyto$roGFP2.ratio - nf.cyto.med) / nf.cyto.mad
tmp.mito$roGFP2.ratio = (tmp.mito$roGFP2.ratio - nf.mito.med) / nf.mito.mad
}
rm(nf.ctrl.med,
nf.cyto.med,
nf.cyto.mad,
nf.mito.med,
nf.mito.mad)
#---------------------------------------------------------------------------------------------
# Normalized data with replicates - converting from long to wide table format
#---------------------------------------------------------------------------------------------
#---Cytoplasm---#
tmp.cyto.repl = tmp.cyto %>%
select(Gene.Symbol, Plate, Group, roGFP2.ratio, Type, Content) %>%
group_by(Gene.Symbol, Group) %>%
mutate(pseurep = paste0("roGFP2_ratio_", 1:n())) %>%
spread(key = pseurep, value = roGFP2.ratio) %>%
ungroup() %>%
select(-Group) %>%
rename(Genes = Gene.Symbol,
Nutrient = Type,
Organelle = Content)
#---Mitochondria---#
tmp.mito.repl = tmp.mito %>%
select(Gene.Symbol, Plate, Group, roGFP2.ratio, Type, Content) %>%
group_by(Gene.Symbol, Group) %>%
mutate(pseurep = paste0("roGFP2_ratio_", 1:n())) %>%
spread(key = pseurep, value = roGFP2.ratio) %>%
ungroup() %>%
select(-Group) %>%
rename(Genes = Gene.Symbol,
Nutrient = Type,
Organelle = Content)
#--Compilation---#
normDatReps[[i]][[j]] = rbind(tmp.cyto.repl, tmp.mito.repl)
#Deleting
rm(tmp.ctrl,
tmp.cyto,
tmp.mito,
tmp.cyto.repl,
tmp.mito.repl)
}
}
rm(j)
}
rm(i)
res = lapply(normDatReps, function(x) {
x = do.call("rbind", x)
x = as.data.frame(x)
x$Plate = factor(x$Plate)
x$Organelle = factor(x$Organelle, levels = c("Mitochondria", "Cytoplasm"))
x$Nutrient = factor(x$Nutrient, levels = c("Glucose", "Galactose", "Glycerol"))
rownames(x) = 1:nrow(x)
return(x)
})
res = do.call("rbind", res)
#-------------------------------------------------------------------------------------------------
# Summarizing multiple mutants (i.e. same gene) from the same or different plates
# Also dropping genes with 2 or more NA values
# For genes with just 1 NA value, replacing it with the median of the remaining 3 observed values
#-------------------------------------------------------------------------------------------------
res = res %>%
rowwise() %>%
mutate(Median_roGFP2_ratio = median(c(roGFP2_ratio_1, roGFP2_ratio_2, roGFP2_ratio_3, roGFP2_ratio_4), na.rm=T),
NA_per_row = sum(is.na(c(roGFP2_ratio_1, roGFP2_ratio_2, roGFP2_ratio_3, roGFP2_ratio_4)))) %>%
filter(NA_per_row < 2) %>%
ungroup() %>%
mutate_at(vars(starts_with("roGFP2_ratio_")),
function(x) ifelse(is.na(x), .$Median_roGFP2_ratio, x)) %>%
select(-NA_per_row) %>%
group_by(Organelle, Nutrient, Genes) %>%
top_n(n=1, abs(Median_roGFP2_ratio)) %>%
ungroup()
#-------------------------------------------------------------------------------------------------
# Compiling all replicates across organelles and nutrient conditions into a single matrix
#-------------------------------------------------------------------------------------------------
redoxMat = data.table(res[,c("Genes", "Nutrient", "Organelle", "roGFP2_ratio_1", "roGFP2_ratio_2", "roGFP2_ratio_3", "roGFP2_ratio_4")])
redoxMat = dcast(redoxMat, Genes ~ Organelle + Nutrient, fun.aggregate = function(x){x}, fill=NA,
value.var = c("roGFP2_ratio_1", "roGFP2_ratio_2", "roGFP2_ratio_3", "roGFP2_ratio_4"))
redoxMat = redoxMat[,c(1,
2,8,14,20,
3,9,15,21,
4,10,16,22,
5,11,17,23,
6,12,18,24,
7,13,19,25
)]
redoxMat = data.frame(redoxMat, stringsAsFactors = F )
rownames(redoxMat) = redoxMat$Genes
redoxMat = redoxMat[,-1]
#-------------------------------------------------------------------------------------------------
# Compiling the median values across organelles and nutrient conditions into a single matrix
#-------------------------------------------------------------------------------------------------
redoxMat_median = data.table(res[,c("Genes", "Nutrient", "Organelle", "Median_roGFP2_ratio")])
redoxMat_median = dcast(redoxMat_median, Genes ~ Organelle + Nutrient, fun.aggregate = function(x){x}, fill=NA,
value.var = "Median_roGFP2_ratio")
redoxMat_median = data.frame(redoxMat_median, stringsAsFactors = F )
rownames(redoxMat_median) = redoxMat_median$Genes
redoxMat_median = redoxMat_median[,-1]
#-------------------------------------------------------------------------------------------------
# Putting all the results into one
#-------------------------------------------------------------------------------------------------
res = list(redox_table = res, redox_replicates = redoxMat, redox_median = redoxMat_median)
rm(normDat, normDatReps, redoxMat, redoxMat_median)
return(res)
}
normDat.ctrNorm = normalizeScreenData(normMethod = "fracControl")
normDat.Znorm = normalizeScreenData(normMethod = "robustZ")
Normalized data overview
We normalized the data as described above. Below are summaries of the normalized data for each strategy:
- For per-plate normalization based on the plate-specific controls
Genes Plate Nutrient Organelle
Length:4675 2 : 484 Glucose :1561 Mitochondria:2336
Class :character 4 : 473 Galactose:1549 Cytoplasm :2339
Mode :character 3 : 472 Glycerol :1565
9 : 460
8 : 418
5 : 402
(Other):1966
roGFP2_ratio_1 roGFP2_ratio_2 roGFP2_ratio_3 roGFP2_ratio_4
Min. :0.000226 Min. :0.000228 Min. :0.000214 Min. :0.000228
1st Qu.:1.199315 1st Qu.:1.197896 1st Qu.:1.204309 1st Qu.:1.197828
Median :1.611629 Median :1.606277 Median :1.598560 Median :1.595587
Mean :1.627920 Mean :1.625899 Mean :1.623656 Mean :1.623058
3rd Qu.:2.005188 3rd Qu.:2.000470 3rd Qu.:2.002525 3rd Qu.:2.011491
Max. :4.213349 Max. :4.139418 Max. :3.748545 Max. :4.145865
NA's :8
Median_roGFP2_ratio
Min. :0.000228
1st Qu.:1.202107
Median :1.606897
Mean :1.627360
3rd Qu.:2.008029
Max. :4.032744
- For per-plate normalization based on the organelle-specific plate medians
Genes Plate Nutrient Organelle
Length:4718 2 : 484 Glucose :1573 Mitochondria:2358
Class :character 4 : 475 Galactose:1572 Cytoplasm :2360
Mode :character 3 : 472 Glycerol :1573
9 : 462
8 : 420
5 : 402
(Other):2003
roGFP2_ratio_1 roGFP2_ratio_2 roGFP2_ratio_3
Min. :-20.312549 Min. :-20.399022 Min. :-22.502049
1st Qu.: -0.683968 1st Qu.: -0.712392 1st Qu.: -0.676880
Median : 0.001691 Median : 0.000000 Median : -0.000439
Mean : 0.006676 Mean : -0.001119 Mean : -0.010808
3rd Qu.: 0.659867 3rd Qu.: 0.675725 3rd Qu.: 0.650951
Max. : 14.312598 Max. : 13.803653 Max. : 14.430594
roGFP2_ratio_4 Median_roGFP2_ratio
Min. :-15.909335 Min. :-18.285617
1st Qu.: -0.701527 1st Qu.: -0.658868
Median : -0.001329 Median : 0.006156
Mean : -0.013836 Mean : 0.008269
3rd Qu.: 0.644119 3rd Qu.: 0.622557
Max. : 13.349369 Max. : 14.312598
NA's :13
- Number of unique mutants per nutrient and organelle condition (plate-control normalized data)
a = normDat.ctrNorm$redox_table
a = split(a$Genes, paste0(a$Nutrient, a$Organelle))
sapply(a, function(x) length(unique(x)))
GalactoseCytoplasm GalactoseMitochondria GlucoseCytoplasm
775 774 781
GlucoseMitochondria GlycerolCytoplasm GlycerolMitochondria
780 783 782
- Similarity between the normalization strategies
plotList = vector("list", 6)
names(plotList) = c("Glucose-Cytoplasm", "Glucose-Mitochondria",
"Galactose-Cytoplasm", "Galactose-Mitochondria",
"Glycerol-Cytoplasm", "Glycerol-Mitochondria")
plotListTop = plotList
for(i in c("Glucose", "Galactose", "Glycerol"))
{
for(j in c("Cytoplasm", "Mitochondria"))
{
a = normDat.Znorm$redox_table[which(normDat.Znorm$redox_table$Nutrient == i & normDat.Znorm$redox_table$Organelle == j),]
a = a[, c("Genes", "Median_roGFP2_ratio")]
colnames(a) = c("Genes", "Znorm")
b = normDat.ctrNorm$redox_table[which(normDat.ctrNorm$redox_table$Nutrient == i & normDat.ctrNorm$redox_table$Organelle == j),]
b = b[, c("Genes", "Median_roGFP2_ratio")]
colnames(b) = c("Genes", "Ctrlnorm")
df = merge(a,b)
topZ = quantile(df$Znorm, probs = c(0.05, 0.95))
topC = quantile(df$Ctrlnorm, probs = c(0.05, 0.95))
df_top = df[which(df$Znorm < topZ[1] | df$Znorm > topZ[2] | df$Ctrlnorm < topC[1] | df$Ctrlnorm > topC[2]),]
id = paste(i,j,sep="-")
plotList[[id]] = ggscatter(df, x = "Znorm", y = "Ctrlnorm",
color = "black", shape = 20, size = 0.5, # Points color, shape and size
add = "reg.line", # Add regression line
add.params = list(color = "blue", fill = "lightgray"), # Customize reg. line
conf.int = TRUE, # Add confidence interval
cor.coef = TRUE, # Add correlation coefficient. see ?stat_cor
cor.coeff.args = list(method = "spearman", label.x = -10, label.y = 1.5, label.sep = "\n"),
ggtheme = theme_classic(base_size = 8)) + labs(subtitle = id)
plotListTop[[id]] = ggscatter(df_top, x = "Znorm", y = "Ctrlnorm",
color = "black", shape = 20, size = 0.5, # Points color, shape and size
add = "reg.line", # Add regression line
add.params = list(color = "blue", fill = "lightgray"), # Customize reg. line
conf.int = TRUE, # Add confidence interval
cor.coef = TRUE, # Add correlation coefficient. see ?stat_cor
cor.coeff.args = list(method = "spearman", label.x = -10, label.y = 1.5, label.sep = "\n"),
ggtheme = theme_classic(base_size = 8)) + labs(subtitle = id)
rm(a, b, df,df_top, topZ, topC, id)
}
rm(j)
}
rm(i)
Plotting the correlation between the two normalization techniques for all matching mutants
plotList$`Glucose-Cytoplasm` + plotList$`Glucose-Mitochondria` + plotList$`Galactose-Cytoplasm` +
plotList$`Galactose-Mitochondria` + plotList$`Glycerol-Cytoplasm` + plotList$`Glycerol-Mitochondria` +
plot_layout(nrow = 3, ncol = 2)
Plotting the correlation between the two normalization techniques using only the top hits, i.e. mutants whose median roGFP2 ratio lies in the lowest or highest 5% under either normalization method
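For reference, the top-hit selection used in the loop above (df_top) and the Spearman rank correlation can be reproduced on simulated data. In this sketch the two score vectors are random numbers standing in for the robust Z and plate-control summaries, so the resulting correlations say nothing about the screen itself.

```r
set.seed(1)
# Simulated per-gene summaries standing in for the two strategies (not screen data)
n <- 200
df <- data.frame(Znorm = rnorm(n))
df$Ctrlnorm <- 1.6 + 0.4 * df$Znorm + rnorm(n, sd = 0.1)

# 5% and 95% quantiles of each strategy
qZ <- quantile(df$Znorm, probs = c(0.05, 0.95))
qC <- quantile(df$Ctrlnorm, probs = c(0.05, 0.95))

# A mutant is a top hit if it lies in either tail under either strategy
is_top <- df$Znorm < qZ[1] | df$Znorm > qZ[2] |
  df$Ctrlnorm < qC[1] | df$Ctrlnorm > qC[2]
df_top <- df[is_top, ]

# Spearman rank correlation on all mutants and on the top hits only
rho_all <- cor(df$Znorm, df$Ctrlnorm, method = "spearman")
rho_top <- cor(df_top$Znorm, df_top$Ctrlnorm, method = "spearman")
c(all = rho_all, top = rho_top, n_top = nrow(df_top))
```

Because the selection is a union of tails from two methods, the top-hit subset holds between 10% and 20% of the mutants, depending on how much the two methods agree on the extremes.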
plotListTop$`Glucose-Cytoplasm` + plotListTop$`Glucose-Mitochondria` + plotListTop$`Galactose-Cytoplasm` +
plotListTop$`Galactose-Mitochondria` + plotListTop$`Glycerol-Cytoplasm` + plotListTop$`Glycerol-Mitochondria` +
plot_layout(nrow = 3, ncol = 2)
Sample similarity - Dimensionality reduction
Next, we apply multidimensional scaling (MDS), a dimensionality reduction method, to the replicate matrices normalized by the plate controls and by the robust Z score, to identify the major groupings in the screen based on the redox status of the yeast mutants across organelles and nutrient conditions. Each point in the MDS plots is one replicate column of a given organelle and nutrient condition, described by the normalized values of all mutants. The redox status clearly differs between the organelles (also seen in the violin plots, with higher values in mitochondria than in the cytoplasm), and within each organelle the replicates group by nutrient condition.
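To make the input of getMDSdata() concrete, the following sketch runs classical MDS (cmdscale()) on a simulated genes-by-samples matrix whose column names follow the roGFP2_ratio_<replicate>_<Compartment>_<Nutrient> pattern of redox_replicates. The matrix, the three conditions and the condition shifts are hypothetical; only the mechanics (distances between columns, two retained dimensions, annotation parsed from the column names) mirror the function below.

```r
set.seed(42)
# Hypothetical column names in the roGFP2_ratio_<rep>_<Compartment>_<Nutrient> pattern
conds <- c("Mitochondria_Glucose", "Cytoplasm_Glucose", "Cytoplasm_Glycerol")
cols <- paste0("roGFP2_ratio_", rep(1:4, times = length(conds)), "_",
               rep(conds, each = 4))

# Simulated genes-by-samples matrix with a made-up shift per condition
dat <- matrix(rnorm(50 * length(cols)), nrow = 50,
              dimnames = list(paste0("gene", 1:50), cols))
dat <- sweep(dat, 2, rep(c(2, 0, -1), each = 4), "+")

# Classical MDS on Euclidean distances between samples (the columns of dat)
mds <- cmdscale(dist(t(dat)), k = 2)
colnames(mds) <- c("MDS1", "MDS2")

# Annotation parsed from the sample names: fields 4 and 5 of the "_"-split name
parts <- do.call(rbind, strsplit(rownames(mds), "_"))
mds_df <- data.frame(mds, Compartment = parts[, 4], Nutrient = parts[, 5],
                     stringsAsFactors = FALSE)
print(mds_df)
```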
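getMDSdata = function(dat) {
# Classical MDS on Euclidean distances between columns (replicate x organelle x nutrient)
mds = cmdscale(dist(t(dat)), eig = TRUE, k = 2)$points
colnames(mds) = c("MDS1", "MDS2")
# Column names look like roGFP2_ratio_<rep>_<Compartment>_<Nutrient>; keep the last two fields
col_anno = do.call("rbind", strsplit(rownames(mds), "_"))
col_anno = col_anno[, -c(1:3)]
colnames(col_anno) = c("Compartment", "Nutrient")
return(data.frame(mds, col_anno))
}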
mdsZ = getMDSdata(dat = normDat.Znorm$redox_replicates)
mdsC = getMDSdata(dat = normDat.ctrNorm$redox_replicates)
p1 = ggplot(mdsZ, aes(x = MDS1, y = MDS2)) + theme_classic(base_size = 8) + labs(subtitle = "Robust Z normalized") +
geom_point(aes(color = Nutrient, shape = Compartment), size = 2) + scale_color_manual(values = c("#fc8d62",
"#66c2a5", "#8da0cb"))
p2 = ggplot(mdsC, aes(x = MDS1, y = MDS2)) + theme_classic(base_size = 8) + labs(subtitle = "Plate control normalized") +
geom_point(aes(color = Nutrient, shape = Compartment), size = 2) + scale_color_manual(values = c("#fc8d62",
"#66c2a5", "#8da0cb"))
pA = p1 + p2 + plot_layout(guides = "collect")
p3 <- ggviolin(normDat.Znorm$redox_table, x = "Organelle", y = "Median_roGFP2_ratio",
xlab = "", ylab = "Median roGFP2 ratio", draw_quantiles = 0.5, fill = "Organelle",
facet.by = "Nutrient", color = "grey90") + stat_compare_means(label = "p.format",
method = "wilcox", cex = 2) + labs(subtitle = "Robust Z normalized") + theme_classic(base_size = 8) +
theme(legend.position = "right", axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
scale_fill_manual(values = c("#b2df8a", "#7570b3"))
p4 <- ggviolin(normDat.ctrNorm$redox_table, x = "Organelle", y = "Median_roGFP2_ratio",
xlab = "", ylab = "Median roGFP2 ratio", draw_quantiles = 0.5, fill = "Organelle",
facet.by = "Nutrient", color = "grey90") + stat_compare_means(label = "p.format",
method = "wilcox", cex = 2) + labs(subtitle = "Plate control normalized") + theme_classic(base_size = 8) +
theme(legend.position = "right", axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
scale_fill_manual(values = c("#b2df8a", "#7570b3"))
pB = p3 + p4 + plot_layout(guides = "collect")
p5 = ggviolin(normDat.Znorm$redox_table, x = "Nutrient", y = "Median_roGFP2_ratio",
xlab = "", ylab = "Median roGFP2 ratio", draw_quantiles = 0.5, fill = "Nutrient",
facet.by = "Organelle", color = "grey90") + stat_compare_means(method = "anova",
label.y = 22, label.x = 1.5, cex = 2) + stat_compare_means(label = "p.signif",
method = "wilcox", ref.group = ".all.", hide.ns = TRUE) + scale_fill_manual(values = c("#66c2a5",
"#fc8d62", "#8da0cb")) + labs(subtitle = "Robust Z normalized") + theme_classic(base_size = 8) +
theme(legend.position = "right", axis.text.x = element_blank(), axis.ticks.x = element_blank())
p6 = ggviolin(normDat.ctrNorm$redox_table, x = "Nutrient", y = "Median_roGFP2_ratio",
xlab = "", ylab = "Median roGFP2 ratio", draw_quantiles = 0.5, fill = "Nutrient",
facet.by = "Organelle", color = "grey90") + stat_compare_means(method = "anova",
label.y = 5, label.x = 1.5, cex = 2) + stat_compare_means(label = "p.signif",
method = "wilcox", ref.group = ".all.", hide.ns = TRUE) + scale_fill_manual(values = c("#66c2a5",
"#fc8d62", "#8da0cb")) + labs(subtitle = "Plate control normalized") + theme_classic(base_size = 8) +
theme(legend.position = "right", axis.text.x = element_blank(), axis.ticks.x = element_blank())
pC = p5 + p6 + plot_layout(guides = "collect")
p = pA/pB/pC
ggsave(filename = paste0(path, "analysis/normalization/roGFP2_ratio_comparison_nutrient_compartments.pdf"),
plot = p, width = 7, height = 7)
p
Replicate similarity - Correlation
Below we show the correlation among replicates.
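The correlation matrices are Spearman correlations between all replicate columns, each pair computed on the mutants observed in both columns (use = "pairwise.complete.obs"). The sketch below shows this on a small simulated matrix (hypothetical data). It masks the self-correlations with diag() rather than with c1[c1 == 1] = NA as in the code below; the two agree unless an off-diagonal pair happens to be perfectly correlated.

```r
set.seed(7)
# Simulated genes-by-replicates matrix with two missing values (hypothetical data)
x <- matrix(rnorm(40), nrow = 10, dimnames = list(NULL, paste0("rep", 1:4)))
x[2, 1] <- NA
x[5, 3] <- NA

# Spearman correlation; each pair of columns uses the genes observed in both
cc <- cor(x, method = "spearman", use = "pairwise.complete.obs")

# Mask only the diagonal (self-correlations) so it does not dominate the color scale
diag(cc) <- NA
round(cc, 2)
```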
- First, with the robust Z normalization based on plate medians (computed separately per nutrient and organelle)
c1 = cor(normDat.Znorm$redox_replicates, method = "spearman", use = "pairwise.complete.obs")
c2 = cor(normDat.ctrNorm$redox_replicates, method = "spearman", use = "pairwise.complete.obs")
if (identical(colnames(c1), colnames(c2))) {
col_anno = do.call("rbind", strsplit(colnames(c1), "_"))
col_anno = col_anno[, -c(1:3)]
colnames(col_anno) = c("Compartment", "Nutrient")
rownames(col_anno) = colnames(c1)
col_anno = data.frame(col_anno, stringsAsFactors = F)
}
colr = list(Compartment = c(Mitochondria = "#b2df8a", Cytoplasm = "#7570b3"), Nutrient = c(Glucose = "#66c2a5",
Galactose = "#fc8d62", Glycerol = "#8da0cb"))
c1[c1 == 1] = NA
c2[c2 == 1] = NA
pheatmap(c1, clustering_method = "ward.D2", color = brewer.pal(n = 9, "Greys"), show_rownames = F,
show_colnames = F, border_color = "white", na_col = "white", annotation_row = col_anno,
annotation_col = col_anno, annotation_colors = colr, fontsize = 8, main = "Robust Z normalized")
pheatmap(c1, clustering_method = "ward.D2", color = brewer.pal(n = 9, "Greys"), show_rownames = F,
show_colnames = F, border_color = "white", na_col = "white", annotation_row = col_anno,
annotation_col = col_anno, annotation_colors = colr, fontsize = 8, filename = paste0(path,
"analysis/normalization/replicate_correlation_normZnorm.pdf"), width = 5,
height = 5)
- Second, with the control-based normalization
pheatmap(c2, clustering_method = "ward.D2", color = brewer.pal(n = 9, "Greys"), show_rownames = F,
show_colnames = F, border_color = "white", na_col = "white", annotation_row = col_anno,
annotation_col = col_anno, annotation_colors = colr, fontsize = 8, main = "Control normalized")
pheatmap(c2, clustering_method = "ward.D2", color = brewer.pal(n = 9, "Greys"), show_rownames = F,
show_colnames = F, border_color = "white", na_col = "white", annotation_row = col_anno,
annotation_col = col_anno, annotation_colors = colr, fontsize = 8, filename = paste0(path,
"analysis/normalization/replicate_correlation_normCtr.pdf"), width = 5, height = 5)
Distribution of normalized data
- For per-plate normalization based on the plate-specific controls, with an interactive data table to access the data
p = ggplot(normDat.ctrNorm$redox_table) + theme_bw(base_size = 8) +
geom_boxplot(aes(x = Plate, y = Median_roGFP2_ratio), outlier.size = 0.1, lwd=0.2) + #ylim(-10, 10) +
facet_grid(Nutrient ~ Organelle) +
theme(axis.text.x = element_text(angle = 90, hjust = 0.5, vjust = 0.5)) +
geom_text_repel(data = subset(normDat.ctrNorm$redox_table, Median_roGFP2_ratio > 2.5 | Median_roGFP2_ratio < 1), aes(x = Plate, y = Median_roGFP2_ratio, label = Genes), size = 2)
ggsave(filename = paste0(path,"analysis/normalization/roGFP2_ratio_distribution_PlateControl_normalized.pdf"),
plot = p, width = 10, height = 15)
p
tmp = normDat.ctrNorm$redox_table
tmp[,5:9] = round(tmp[,5:9],3)
datatable(tmp, rownames = FALSE, filter="top", class="compact",
extensions = c('Buttons') ,
options = list(autoWidth = TRUE,
dom = 'Bfrtip',
buttons = c('csv', 'excel')
))
- For per-plate normalization based on the organelle-specific plate medians, with an interactive data table to access the data
p = ggplot(normDat.Znorm$redox_table) + theme_bw(base_size = 9) + #geom_hline(yintercept = c(-5,5)) +
geom_boxplot(aes(x = Plate, y = Median_roGFP2_ratio), outlier.size = 0.1, lwd = 0.2) + #ylim(-10, 10) +
facet_grid(Nutrient ~ Organelle) +
theme(axis.text.x = element_text(angle = 90, hjust = 0.5, vjust = 0.5)) +
geom_text_repel(data = subset(normDat.Znorm$redox_table, Median_roGFP2_ratio > 3 | Median_roGFP2_ratio < -3), aes(x = Plate, y = Median_roGFP2_ratio, label = Genes), size = 2)
ggsave(filename = paste0(path,"analysis/normalization/roGFP2_ratio_distribution_RobustZ_normalized.pdf"),
plot = p, width = 10, height = 10)
p
tmp = normDat.Znorm$redox_table
tmp[,5:9] = round(tmp[,5:9],3)
datatable(tmp, rownames = FALSE, filter="top", class="compact",
extensions = c('Buttons') ,
options = list(autoWidth = TRUE,
dom = 'Bfrtip',
buttons = c('csv', 'excel')
))
Saving the data
Finally, we save the normalized data as .RDS (R data object) and .xlsx (Excel) files. All follow-up downstream analyses start from these normalized data.
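Downstream scripts can reload the saved objects with readRDS(), which restores the R object exactly as it was saved. A minimal round trip, using a hypothetical stand-in object and a temporary file rather than the project paths:

```r
# Hypothetical stand-in for one of the normalized result lists
obj <- list(redox_table = data.frame(Genes = c("A", "B"), Median = c(1.2, -0.4),
                                     stringsAsFactors = FALSE))
f <- tempfile(fileext = ".RDS")   # temporary file, not a project path
saveRDS(obj, f)
reloaded <- readRDS(f)
identical(obj, reloaded)
```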
saveRDS(normDat.ctrNorm, paste0(path, "data/workspaces/YeastMutantRedox_NormalizedData_PlateControl.RDS"))
saveRDS(normDat.Znorm, paste0(path, "data/workspaces/YeastMutantRedox_NormalizedData_RobustZ.RDS"))
WriteXLS(normDat.ctrNorm$redox_table, ExcelFileName = paste0(path, "analysis/supplementary/tables/YeastMutantRedox_NormalizedData_PlateControl.xlsx"),
AdjWidth = TRUE, BoldHeaderRow = TRUE, FreezeRow = 1)
WriteXLS(normDat.Znorm$redox_table, ExcelFileName = paste0(path, "analysis/supplementary/tables/YeastMutantRedox_NormalizedData_RobustZ.xlsx"),
AdjWidth = TRUE, BoldHeaderRow = TRUE, FreezeRow = 1)
Session information
R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.16
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RColorBrewer_1.1-2 pheatmap_1.0.12 patchwork_1.0.0 ggpubr_0.2.4
[5] magrittr_1.5 ggrepel_0.8.1 WriteXLS_5.0.0 data.table_1.12.8
[9] forcats_0.4.0 stringr_1.4.0 dplyr_0.8.4 purrr_0.3.3
[13] readr_1.3.1 tidyr_1.0.2 tibble_2.1.3 ggplot2_3.2.1
[17] tidyverse_1.3.0 DT_0.12 rmdformats_0.3.6 knitr_1.28
loaded via a namespace (and not attached):
[1] httr_1.4.1 jsonlite_1.6.1 modelr_0.1.5 shiny_1.4.0
[5] assertthat_0.2.1 cellranger_1.1.0 yaml_2.2.1 pillar_1.4.3
[9] backports_1.1.5 lattice_0.20-38 glue_1.3.1 digest_0.6.23
[13] promises_1.1.0 ggsignif_0.6.0 rvest_0.3.5 colorspace_1.4-1
[17] htmltools_0.4.0 httpuv_1.5.5 plyr_1.8.5 pkgconfig_2.0.3
[21] broom_0.5.4 haven_2.2.0 bookdown_0.17 xtable_1.8-4
[25] scales_1.1.0 later_1.0.0 generics_0.0.2 farver_2.0.3
[29] ellipsis_0.3.0 withr_2.1.2 lazyeval_0.2.2 cli_2.0.1
[33] mime_0.9 crayon_1.3.4 readxl_1.3.1 evaluate_0.14
[37] fs_1.3.1 fansi_0.4.1 nlme_3.1-144 xml2_1.2.2
[41] tools_3.6.2 hms_0.5.3 formatR_1.7 lifecycle_0.1.0
[45] munsell_0.5.0 reprex_0.3.0 compiler_3.6.2 rlang_0.4.4
[49] grid_3.6.2 rstudioapi_0.11 htmlwidgets_1.5.1 crosstalk_1.0.0
[53] labeling_0.3 rmarkdown_2.1 gtable_0.3.0 DBI_1.1.0
[57] reshape2_1.4.3 R6_2.4.1 lubridate_1.7.4 fastmap_1.0.1
[61] stringi_1.4.5 Rcpp_1.0.3 vctrs_0.2.2 dbplyr_1.4.2
[65] tidyselect_1.0.0 xfun_0.12